Random Forest for Bioinformatics
Abstract
Modern biology increasingly relies on machine learning techniques to analyze large-scale, complex biological data. In bioinformatics, the Random Forest (RF) [6] technique, which builds an ensemble of decision trees and naturally incorporates feature selection and feature interactions into the learning process, is a popular choice. It is nonparametric, interpretable, efficient, and achieves high prediction accuracy on many types of data. Recent work in computational biology has made increasing use of random forest, owing to its advantages in dealing with small sample sizes, high-dimensional feature spaces, and complex data structures. The aim of this chapter is two-fold. First, it reviews notable extensions of random forest in bioinformatics and discusses promising directions such as RF-based feature selection. Second, it briefly introduces the applications of random forest and its extensions. RF has been applied to a broad spectrum of biological tasks, including classifying different types of samples using microarray gene expression data, identifying disease-associated genes from genome-wide association studies, recognizing important elements in protein sequences, and identifying protein-protein interactions.
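To make the abstract's description concrete, the following is a minimal sketch of a random forest used for feature ranking in a small-sample, high-dimensional setting. It is not the chapter's implementation: the data are synthetic (80 samples, 30 features, with only features 0 and 1 informative), every function name (`gini`, `best_split`, `grow`, `fit_forest`, `predict`) is hypothetical, and the importance score is a simplified sum of sample-weighted Gini decreases. Only the Python standard library is assumed.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, rows, features):
    """Best (gain, feature, threshold) among the candidate features."""
    parent = gini([y[i] for i in rows])
    best = (0.0, None, None)
    for f in features:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if parent - child > best[0]:
                best = (parent - child, f, t)
    return best

def grow(X, y, rows, mtry, depth, rng, importance):
    """Grow one tree; a leaf is a class label, a node is (f, t, left, right)."""
    labels = [y[i] for i in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    # Random feature subset at every node: the "random" in random forest.
    features = rng.sample(range(len(X[0])), mtry)
    gain, f, t = best_split(X, y, rows, features)
    if f is None:
        return majority
    importance[f] += gain * len(rows)  # sample-weighted Gini decrease
    left = [i for i in rows if X[i][f] <= t]
    right = [i for i in rows if X[i][f] > t]
    return (f, t,
            grow(X, y, left, mtry, depth - 1, rng, importance),
            grow(X, y, right, mtry, depth - 1, rng, importance))

def fit_forest(X, y, n_trees=60, mtry=None, max_depth=3, seed=0):
    """Fit trees on bootstrap samples; return (trees, importance)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))
    importance = [0.0] * p
    trees = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        trees.append(grow(X, y, rows, mtry, max_depth, rng, importance))
    return trees, importance

def predict(trees, x):
    """Majority vote over all trees for one sample."""
    votes = []
    for node in trees:
        while isinstance(node, tuple):
            f, t, left, right = node
            node = left if x[f] <= t else right
        votes.append(node)
    return Counter(votes).most_common(1)[0][0]

# Synthetic data (an assumption for illustration): only features 0 and 1 matter.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(30)] for _ in range(80)]
y = [1 if row[0] + row[1] > 0 else 0 for row in X]

trees, importance = fit_forest(X, y)
ranked = sorted(range(len(importance)), key=lambda f: importance[f], reverse=True)
accuracy = sum(predict(trees, x) == label for x, label in zip(X, y)) / len(X)
print("top features:", ranked[:5])
print("training accuracy:", round(accuracy, 3))
```

Ranking features by accumulated impurity decrease is one common way to obtain RF-based feature selection. In practice an established library implementation and out-of-bag or held-out error estimates would be preferable to the training accuracy printed here.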
Similar resources
IntegratedMRF: random forest-based framework for integrating prediction from different data types
Summary: IntegratedMRF is an open-source R implementation for integrating drug response predictions from various genomic characterizations using univariate or multivariate random forests that includes various options for error estimation techniques. The integrated framework was developed following superior performance of random forest based methods in NCI-DREAM drug sensitivity prediction challe...
Approximate False Positive Rate Control in Selection Frequency for Random Forest
Random Forest has become one of the most popular tools for feature selection. Its ability to deal with high-dimensional data makes this algorithm especially useful for studies in neuroimaging and bioinformatics. Despite its popularity and wide use, feature selection in Random Forest still lacks a crucial ingredient: false positive rate control. To date there is no efficient, principled and comp...
Guided Random Forest in the RRF Package
Summary: Random Forest (RF) is a powerful supervised learner and has been popularly used in many applications such as bioinformatics. In this work we propose the guided random forest (GRF) for feature selection. Similar to a feature selection method called guided regularized random forest (GRRF), GRF is built using the importance scores from an ordinary RF. However, the trees in GRRF are built ...
L1-based compression of random forest models
High-dimensional supervised learning problems, e.g. in image exploitation and bioinformatics, are more frequent than ever. Tree-based ensemble methods, such as random forests (Breiman, 2001) and extremely randomized trees (Geurts et al., 2006), are effective variance reduction techniques offering in this context a good trade-off between accuracy, computational complexity, and interpretability.
EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis
Motivation: We developed an EM-random forest (EMRF) for Haseman-Elston quantitative trait linkage analysis that accounts for marker ambiguity and weighs each sib-pair according to the posterior identical by descent (IBD) distribution. The usual random forest (RF) variable importance (VI) index used to rank markers for variable selection is not optimal when applied to linkage data because of corr...
Publication date: 2011